A Method for Cross-Language Retrieval of Chunks Using Monolingual and Bilingual Corpora

Authors

  • Tayebeh Mosavi Miangah
  • Amin Nezarat
  • H. H. Chen
Abstract

Information retrieval (IR) is a crucial area of natural language processing (NLP) and can be defined as finding documents whose content is relevant to a user's query. Cross-language information retrieval (CLIR) is information retrieval in which the language of the query differs from that of the searched documents. A fundamental issue in bilingual retrieval in search engines is how, and how often, users search with phrases and chunks. The main problem arises when existing bilingual dictionaries cannot meet users' actual needs for translating such phrases and chunks into another language, so the results are often unreliable. This paper reports the findings of an experiment designed to address this problem. We introduce a heuristic method for extracting the correct equivalents of source-language chunks using monolingual and bilingual corpora together with text classification algorithms. For this purpose, we use a statistical measure known as the Association Score (AS) to compute the association value between every two corresponding chunks in the corpus. The results of the experiment, which examined how effectively the heuristic method extracts all possible chunks in Persian and finds the most appropriate English equivalents for them, are very encouraging.
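The abstract does not give the formula for the Association Score, so the following is only a minimal sketch of the general idea of scoring chunk pairs by co-occurrence in a sentence-aligned bilingual corpus. It substitutes the Dice coefficient as a stand-in measure; the function names `dice_association` and `best_equivalent`, the transliterated Persian chunks, and the toy corpus are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def dice_association(aligned_pairs):
    """Score each (source_chunk, target_chunk) pair with the Dice coefficient.

    NOTE: Dice is an assumed stand-in; the paper's AS formula is not given
    in the abstract.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_chunks, tgt_chunks in aligned_pairs:
        # Count each chunk at most once per aligned sentence pair.
        src_set, tgt_set = set(src_chunks), set(tgt_chunks)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update(product(src_set, tgt_set))
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def best_equivalent(scores, source_chunk):
    """Return the target chunk with the highest score, or None if unseen."""
    candidates = {t: v for (s, t), v in scores.items() if s == source_chunk}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical toy corpus: each item pairs the chunks of a source sentence
# with the chunks of its aligned translation.
corpus = [
    (["bazar-e kar"], ["labor market"]),
    (["bazar-e kar", "nerkh-e arz"], ["labor market", "exchange rate"]),
    (["nerkh-e arz"], ["exchange rate"]),
]
scores = dice_association(corpus)
print(best_equivalent(scores, "bazar-e kar"))
```

In this toy corpus, chunks that always appear in aligned sentences together receive the highest score, which is the behaviour any association measure of this kind is meant to capture.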

Similar resources

A Language-Independent Approach to European Text Retrieval

We present an approach to multilingual information retrieval that does not depend on the existence of specific linguistic resources such as stemmers or thesauri. Using the HAIRCUT system we participated in the monolingual, bilingual, and multilingual tasks of the CLEF-2000 evaluation. Our method, based on combining the benefits of words and character n-grams, was effective for both language-in...

Deriving a Bilingual Lexicon for Cross Language Information Retrieval

In this paper we describe a systematic approach to derive a bilingual lexicon automatically from parallel corpora. Following this approach, a lexicon was derived from the English and Dutch versions of the Agenda corpus. With the lexicon and a part of the corpus that was not used to derive the lexicon, a bilingual retrieval environment was built. Recall and precision of monolingual Dutch retrieval wa...

XRCE Participation in CLEF 2002

In this paper, we describe the methods we used for the Cross-Lingual Evaluation Forum CLEF 2002, and more specifically for the GIRT Task. The methods are based on (1) the extraction of two bilingual lexicons, one from parallel corpora and the other one from comparable corpora, (2) the optimal combination of these bilingual lexicons in Cross-Language Information Retrieval and (3) the combination...

Arabic/English Cross Language Information Retrieval Using a Bilingual Dictionary

With the increase of multilingual information available online and the increase of non-native English speakers (Arabic users) browsing the Internet, it has become more important to have information retrieval systems that can carry the retrieval process across language boundaries, that is, cross-language information retrieval (CLIR) systems. The CLIR system responds to the user query in a comprehens...

Comparison of the Dimensions of Executive Functions in Monolingual and Bilingual Children

Objective: This study aimed to compare executive functioning between bilingual and monolingual children. Methods: We recruited a total of 200 children, all under 5 years old, who participated in a cross-sectional study. These participants were separated into two groups based on their enrollment in a second language program. Group one consisted of children enrolled in a second language prog...

Publication date: 2010